Performance of Error Estimators for Classification

Authors

  • Edward R. Dougherty
  • Chao Sima
  • Jianping Hua
  • Blaise Hanczar
  • Ulisses M. Braga-Neto
Abstract

Classification in bioinformatics often suffers from small samples in conjunction with large numbers of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, so the same data must be used for both classifier design and error estimation. Error estimation can then suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper reviews the performance of training-sample error estimators with respect to several criteria: estimation accuracy, variance, bias, correlation with the true error, regression on the true error, and accuracy in ranking feature sets. A number of error estimators are considered: resubstitution, leave-one-out cross-validation, 10-fold cross-validation, bolstered resubstitution, semi-bolstered resubstitution, .632 bootstrap, .632+ bootstrap, and optimal bootstrap. The paper illustrates these performance criteria for certain models and for two real data sets, referring to the literature for more extensive applications of these criteria. The results given in the present paper are consistent with those in the literature and lead to two conclusions: (1) much greater effort needs to be focused on error estimation, and (2) owing to the generally poor performance of error estimators on small samples, a conclusion based on a small-sample error estimator should be considered valid only if it is supported by evidence that the estimator in question can be expected to perform sufficiently well under the circumstances to justify the conclusion.
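Three of the estimators named in the abstract (resubstitution, leave-one-out cross-validation, and .632 bootstrap) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the nearest-centroid classification rule, the synthetic Gaussian data, and all function names (`make_sample`, `train_centroids`, `bootstrap_632`, and so on) are hypothetical choices made here, not the models or data sets studied in the paper. The .632 bootstrap is written in a simple pooled form that weights the resubstitution error by 0.368 and the error on points left out of each bootstrap sample by 0.632.

```python
import random

def make_sample(n_per_class, dim, shift, rng):
    """Synthetic two-class sample (an assumption): unit-variance Gaussians
    whose means differ by `shift` in every coordinate."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y

def train_centroids(X, y):
    """Nearest-centroid rule: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))

def error_rate(centroids, X, y):
    wrong = sum(predict(centroids, x) != t for x, t in zip(X, y))
    return wrong / len(y)

def resubstitution(X, y):
    """Error of the designed classifier on its own training sample."""
    return error_rate(train_centroids(X, y), X, y)

def leave_one_out(X, y):
    """Design on n-1 points, test on the held-out point, average over all n."""
    wrong = 0
    for i in range(len(y)):
        model = train_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        wrong += predict(model, X[i]) != y[i]
    return wrong / len(y)

def bootstrap_632(X, y, n_boot=100, rng=None):
    """0.368 * resubstitution + 0.632 * error on points left out of each
    bootstrap sample (errors pooled over all replicates)."""
    rng = rng or random.Random(0)
    n = len(y)
    wrong = total = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        chosen = set(idx)
        held_out = [i for i in range(n) if i not in chosen]
        if not held_out:
            continue  # nothing left to test on in this replicate
        model = train_centroids([X[i] for i in idx], [y[i] for i in idx])
        wrong += sum(predict(model, X[i]) != y[i] for i in held_out)
        total += len(held_out)
    zero_boot = wrong / total if total else 0.0
    return 0.368 * resubstitution(X, y) + 0.632 * zero_boot

# Small-sample demo: 10 points per class, 5 features, overlapping classes.
X, y = make_sample(10, 5, 1.0, random.Random(42))
print("resubstitution:", resubstitution(X, y))
print("leave-one-out: ", leave_one_out(X, y))
print(".632 bootstrap:", bootstrap_632(X, y))
```

Rerunning the demo with different seeds gives a rough feel for how much each estimate moves from one small sample to the next, which is the kind of variability the abstract warns about.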


Similar resources

The Ratio-type Estimators of Variance with Minimum Average Square Error

Ratio-type estimators were introduced for estimating the population mean and total, but in recent years several estimators for population variance have been proposed based on ratio methods. In this paper two families of estimators are suggested and their approximate mean square errors (MSE) are developed. In addition, the efficiency of these variance estimators are com...


Ridge Stochastic Restricted Estimators in Semiparametric Linear Measurement Error Models

In this article we consider stochastic restricted ridge estimation in semiparametric linear models when the covariates are measured with additive errors. The development of a penalized corrected likelihood method in such models is the basis for the derivation of ridge estimates. The asymptotic normality of the resulting estimates is established. Also, necessary and sufficient condition...


Comparison of Small Area Estimation Methods for Estimating Unemployment Rate

Extended Abstract. In recent years, the need for small area estimation has greatly increased for large surveys, particularly household surveys at the Statistical Centre of Iran (SCI), because of costs and respondent burden. The lack of suitable auxiliary variables between two decennial housing and population censuses is a challenge for SCI in using these methods. In general, the...


Stochastic Restricted Two-Parameter Estimator in Linear Mixed Measurement Error Models

In this study, the stochastic restricted and unrestricted two-parameter estimators of fixed and random effects are investigated in the linear mixed measurement error models. For this purpose, the asymptotic properties and then the comparisons under the criterion of mean squared error matrix (MSEM) are derived. Furthermore, the proposed methods are used for estimating the biasing parameters. Fin...


Generalized Family of Estimators for Imputing Scrambled Responses

When there is a high correlation between the study and the auxiliary variables, the rank of the auxiliary variable also correlates with the study variable. Then, the use of the rank as an additional auxiliary variable may be helpful to increase the efficiency of the estimator of the mean or total of the population. In the present study, we propose two generalized familie...


Optimal mean-square-error calibration of classifier error estimators under Bayesian models

A recently proposed Bayesian modeling framework for classification facilitates both the analysis and optimization of error estimation performance. The Bayesian error estimator is then defined to have optimal mean-square error performance, but in many situations closed-form representations are unavailable and approximations may not be feasible. To address this, we present a method to optimally c...




Publication date: 2008